Add SWE-ZERO 12M dataset#258
Open
neubig wants to merge 4 commits into
Open
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable overall — pipeline is clean, CI passes, and the schema mapping is well-documented. Two issues need to be addressed before merge.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW — adds a new dataset directory only; no shared schema or converter changes.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26894417354
This review was generated by an AI agent (OpenHands) on behalf of the reviewer.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #257.
Adds the
AlienKevin/SWE-ZERO-12M-trajectoriesdataset to ADP following the existing mini-swe-agent dataset patterns.Dataset source
trainFiles added
datasets/AlienKevin_SWE-ZERO-12M-trajectories/README.mddatasets/AlienKevin_SWE-ZERO-12M-trajectories/extract_raw.pydatasets/AlienKevin_SWE-ZERO-12M-trajectories/schema_raw.pydatasets/AlienKevin_SWE-ZERO-12M-trajectories/raw_to_standardized.pydatasets/AlienKevin_SWE-ZERO-12M-trajectories/metadata.jsondatasets/AlienKevin_SWE-ZERO-12M-trajectories/sample_raw.jsondatasets/AlienKevin_SWE-ZERO-12M-trajectories/sample_std.jsondatasets/AlienKevin_SWE-ZERO-12M-trajectories/sample_sft/openhands_v0.jsonSchema mapping summary
systemmessages because they define mini-swe-agent formatting and execution-free constraints.usertask messages toTextObservation(source="user").usermessages beginning withObservation:toTextObservation(source="environment")with the prefix stripped.bashblocks toCodeAction(language="bash"), preserving pre-command reasoning as the action description.MessageActionentries.instance_id,repo,trajectory_format,exit_status, andduration_secin trajectory details.Design decisions
Ambiguity: The source dataset has 100 independent rollouts per PR and repeats
instance_idacross rows.instance_idplus a deterministic SHA-1 content hash.rsteube__carapace-849becomes IDs such asrsteube__carapace-849-f3b732c7f08f.instance_idwould create duplicate sample IDs; adding a synthetic counter would depend on extraction position and be less stable.Ambiguity: The dataset card says most trajectories are incomplete and explicitly frames the corpus as mid-training data rather than verified SFT data.
Submittedonly.exit_status: incompleteare standardized and converted to OpenHands v0 SFT.Ambiguity: Raw observations are encoded as
usermessages prefixed withObservation:.Observation: ./example/cmd/_test/xonsh.pybecomes an environmentTextObservationcontaining./example/cmd/_test/xonsh.py.Ambiguity: Some assistant turns may not contain a valid bash block even though the prompt requests one.
MessageActionso malformed or terminal natural-language turns are preserved.Ambiguity: Assistant messages may contain reasoning before a bash command.
CodeAction.descriptionafter removing a leadingTHOUGHT:label.THOUGHT: I need to inspect files...becomes the code action description.THOUGHT:in descriptions adds format noise; discarding the reasoning loses useful supervision.Known limitations
Tests run
python -m pytest tests/test_dataset_structure.py tests/test_raw_schemas.py tests/test_standardized_schemas.py tests/test_std_to_sft_conversion.py -qPATH=/home/openhands/.local/bin:$PATH python -m pytest tests/ -qThis PR was created by an AI agent (OpenHands) on behalf of the user.
@neubig can click here to continue refining the PR